Skip to content

Conversation

@santhoshbethi
Copy link
Contributor

@santhoshbethi santhoshbethi commented Oct 15, 2025

Comment on lines 1573 to 1576
if not self._null_fill_value:
if isinstance(self, SparseArray) and self.fill_value == 0:
# special case where we can avoid max recursion depth
return SparseArray(self.to_dense().cumsum())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not self._null_fill_value:
if isinstance(self, SparseArray) and self.fill_value == 0:
# special case where we can avoid max recursion depth
return SparseArray(self.to_dense().cumsum())
if not self._null_fill_value and self.fill_value != 0:

Copy link
Member

@rhshadrach rhshadrach Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now this was wrong, but the previous implementation also did not handle the case

arr = pd.arrays.SparseArray([1.0, 0.0, np.nan, 3.0], fill_value=0.0)

correctly as it would have given the result [1.0, 1.0, nan, nan] whereas I believe docstring indicates the result should be [1.0, 1.0, nan, 4.0]. We can accomplish this by:

if not self._null_fill_value:
    return SparseArray(self.to_dense(), fill_value=np.nan).cumsum()

Note this changes the fill_value to NaN, but that adheres to the docstring. Anytime this method has been successful in previous versions, it was always returning an NA fill value regardless of the input fill value. So I think we should stick to that logic at least for now.

Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
Comment on lines 1573 to 1576
if not self._null_fill_value:
if isinstance(self, SparseArray) and self.fill_value == 0:
# special case where we can avoid max recursion depth
return SparseArray(self.to_dense().cumsum())
Copy link
Member

@rhshadrach rhshadrach Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now this was wrong, but the previous implementation also did not handle the case

arr = pd.arrays.SparseArray([1.0, 0.0, np.nan, 3.0], fill_value=0.0)

correctly as it would have given the result [1.0, 1.0, nan, nan] whereas I believe docstring indicates the result should be [1.0, 1.0, nan, 4.0]. We can accomplish this by:

if not self._null_fill_value:
    return SparseArray(self.to_dense(), fill_value=np.nan).cumsum()

Note this changes the fill_value to NaN, but that adheres to the docstring. Anytime this method has been successful in previous versions, it was always returning an NA fill value regardless of the input fill value. So I think we should stick to that logic at least for now.

@rhshadrach rhshadrach added Bug Sparse Sparse Data Type labels Oct 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Sparse Sparse Data Type

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Calling SparseArray.cumsum leads to infinite recursion

4 participants